Robust Multichannel Gender Classification from Speech in Movie Audio

Authors

Naveen Kumar

Md. Nasir

Panayiotis G. Georgiou

Shrikanth S. Narayanan

Abstract

Speech in the form of scripted dialogue forms an important part of the audio signal in movies. However, it is often masked by background audio such as music, ambient noise, or background chatter. These background sounds make even otherwise simple tasks, such as gender classification, challenging. Additionally, the variability of this noise across movies renders standard approaches to source separation or enhancement inadequate. Instead, we exploit the multichannel information present in the different language channels (English, Spanish, French) of each movie to improve the robustness of our gender classification system, relying on the fact that the speaker labels of interest co-occur in every language channel. We fuse the predictions obtained for each channel using Recognition Output Voting Error Reduction (ROVER) and show that this approach improves gender classification accuracy by 7% absolute (11% relative) compared to the best independent prediction on any single channel. For surround-sound movies, we further investigate fusing the mono audio and front center channels, which increases accuracy by 5% and 3% absolute (8% and 4% relative) compared to using only the mono channel and only the front center channel, respectively.
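As an illustration only, the following Python sketch applies a ROVER-style vote to per-channel gender predictions. It assumes the predictions have already been time-aligned segment by segment across the English, Spanish and French channels (plausible here because the speaker labels co-occur in every channel). The function name `rover_fuse`, the `"M"`/`"F"` label strings, and the `alpha` weighting between vote frequency and mean confidence (borrowed from the general ROVER voting scheme) are assumptions, not details taken from the paper.

```python
from typing import List, Optional, Sequence

# Hypothetical labels: "M" (male), "F" (female), or None when a channel
# makes no decision for a segment (e.g. no speech detected).
Label = Optional[str]


def rover_fuse(
    labels: Sequence[Sequence[Label]],
    confidences: Optional[Sequence[Sequence[float]]] = None,
    alpha: float = 1.0,
) -> List[Label]:
    """Fuse time-aligned per-channel labels with a ROVER-style vote (sketch).

    labels[c][t] is the label channel c assigns to segment t.
    confidences[c][t] is that channel's confidence in [0, 1] (optional).
    score(w) = alpha * (votes for w / number of channels)
             + (1 - alpha) * (mean confidence of channels voting for w)
    With alpha = 1.0 this reduces to plain majority voting.
    """
    n_channels = len(labels)
    if n_channels == 0:
        return []
    n_segments = len(labels[0])
    if any(len(ch) != n_segments for ch in labels):
        raise ValueError("all channels must have the same number of segments")
    if confidences is None and alpha != 1.0:
        raise ValueError("confidences are required when alpha < 1")

    fused: List[Label] = []
    for t in range(n_segments):
        # label -> confidences of the channels voting for it (insertion-ordered)
        votes: dict = {}
        for c in range(n_channels):
            w = labels[c][t]
            if w is None:
                continue  # an abstaining channel still counts in the denominator
            conf = confidences[c][t] if confidences is not None else 1.0
            votes.setdefault(w, []).append(conf)
        if not votes:
            fused.append(None)
            continue

        def score(w: str) -> float:
            confs = votes[w]
            frequency = len(confs) / n_channels
            mean_conf = sum(confs) / len(confs)
            return alpha * frequency + (1 - alpha) * mean_conf

        # max() returns the first maximal label, so channel order breaks ties.
        fused.append(max(votes, key=score))
    return fused


if __name__ == "__main__":
    english = ["M", "F", "F", None]
    spanish = ["M", "M", "F", "F"]
    french = ["F", "F", "F", None]
    print(rover_fuse([english, spanish, french]))  # ['M', 'F', 'F', 'F']
```

Setting `alpha` below 1 lets a single highly confident channel outvote two uncertain ones, which is one way the fusion could favor whichever language channel happens to be least masked by background audio in a given scene.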

Similar Articles

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

Practical Considerations for Real-Time Implementation of Speech-Based Gender Detection

This paper describes a detailed analysis and implementation of a robust gender detector for audio stream applications. The implementation, based on melcepstral features and a Gaussian mixture model classifier, is designed to maximize gender classification performance in continuous speech. The described detector outperforms other reported systems based on statistically significant numbers of gen...

Binaural cue coding-Part I: psychoacoustic fundamentals and design principles

Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and BCC side information. The BCC side information has a low data rate and it is derived from the multichannel encoder input signal. A natural application of BCC is multichannel audio data rate reduction since only a single down-mixed audio channel needs to be transmitted. An alternati...

Razer Maelstrom Audio Engine

The Evolution of Audio. Unlike video technologies, which have seen a stream of new innovations over the years (color screens, ever-improving resolutions, brighter LCD and plasma screens, various 3D vision enhancements for home and cinema, etc.), stereo has been, since 1931, the predominant technology used for audio reproduction. 1877 – Monophonic sound reproduction is created and the phonograph i...

Multimedia classification of movie shots using low-level and semantic features

Movie shot categorization may be approached by using audio and visual features for inferring high-level information about a movie shot. Low-level audio and visual features such as color and MFCC and mid-level features such as sky and speech detection have been used in multimedia understanding research. However, integrating all these features in a classifier remains a subject of study. In this p...

Publication date: 2016
